
    Probabilistic analysis of the human transcriptome with side information

    Understanding the functional organization of genetic information is a major challenge in modern biology. Following the initial publication of the human genome sequence in 2001, advances in high-throughput measurement technologies and efficient sharing of research material through community databases have opened new perspectives on the study of living organisms and the structure of life. In this thesis, novel computational strategies have been developed to investigate a key functional layer of genetic information, the human transcriptome, which regulates the function of living cells through protein synthesis. The key contributions of the thesis are general exploratory tools for high-throughput data analysis that have provided new insights into cell-biological networks, cancer mechanisms, and other aspects of genome function. A central challenge in functional genomics is that high-dimensional genomic observations are associated with high levels of complex and largely unknown sources of variation. By combining statistical evidence across multiple measurement sources with the wealth of background information in genomic data repositories, it has been possible to resolve some of the uncertainties associated with individual observations and to identify functional mechanisms that could not be detected from individual measurement sources alone. Statistical learning and probabilistic models provide a natural framework for such modeling tasks. Open-source implementations of the key methodological contributions have been released to facilitate adoption of the developed methods by the research community. Comment: Doctoral thesis. 103 pages, 11 figures.

    A Quantitative Study of History in the English Short-Title Catalogue (ESTC), 1470-1800

    This article analyses publication trends in the field of history in early modern Britain and North America in 1470–1800, based on English Short-Title Catalogue (ESTC) data. Its major contribution is to demonstrate the potential of digitized library catalogues as an essential scholarly tool and part of reproducible research. We also introduce a novel way of quantitatively analysing a particular trend in book production, namely the publishing of works in the field of history. The study is also our first experimental analysis of paper consumption in early modern book production, and demonstrates in practice the importance of open-science principles for library and information science. Three main research questions are addressed: 1) who wrote history; 2) where history was published; and 3) how publishing changed over time in early modern Britain and North America. In terms of our main findings, we demonstrate that the average book size of history publications decreased over time, and that the octavo-sized book was the rising star of the eighteenth century, a clear indication of expanding audiences. The article also compares different aspects of the most popular writers on history, such as Edmund Burke and David Hume. Although focusing on history, these findings may reflect more widespread publishing trends in the early modern era. We show how some of the key questions in this field can be addressed through the quantitative analysis of large-scale bibliographic data collections. Peer reviewed.
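
    The kind of format-share computation behind the octavo finding can be sketched in a few lines of Python. The rows and field layout below are made-up toy data for illustration only, not ESTC records, and the function name is an assumption:

    ```python
    from collections import Counter

    # Toy ESTC-style records: (publication year, gatherings format).
    # A real analysis would query the full catalogue; these rows are illustrative.
    records = [
        (1701, "folio"), (1725, "octavo"), (1742, "octavo"),
        (1749, "quarto"), (1766, "octavo"), (1781, "octavo"),
    ]

    def format_share_by_halfcentury(rows):
        """Share of each book format within each 50-year window."""
        totals, hits = Counter(), Counter()
        for year, fmt in rows:
            window = (year // 50) * 50  # e.g. 1742 -> 1700, 1766 -> 1750
            totals[window] += 1
            hits[(window, fmt)] += 1
        return {(w, f): hits[(w, f)] / totals[w] for (w, f) in hits}
    ```

    Grouping by fixed time windows and normalizing by the window total is the standard way to make raw catalogue counts comparable across periods with very different publishing volumes.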

    FdeSolver: A Julia Package for Solving Fractional Differential Equations

    Implementing and executing numerical algorithms to solve fractional differential equations has been less straightforward than using their integer-order counterparts, posing challenges for practitioners who wish to incorporate fractional calculus in applied case studies. Hence, we created an open-source Julia package, FdeSolver, that provides numerical solutions for fractional-order differential equations based on product-integration rules, predictor-corrector algorithms, and the Newton-Raphson method. The package covers solutions for one-dimensional equations with orders of positive real numbers. For high-dimensional systems, the orders are limited to at most one. Incommensurate derivatives are allowed and defined in the Caputo sense. Here, we summarize the implementation for a representative class of problems, provide comparisons with available alternatives in Julia and Matlab, describe our adherence to good practices in open research software development, and demonstrate the practical performance of the methods in two applications: we show how to simulate microbial community dynamics and model the spread of COVID-19 by fitting the order of derivatives based on epidemiological observations. Overall, these results highlight the efficiency, reliability, and practicality of the FdeSolver Julia package.
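
    The predictor-corrector, product-integration approach the abstract mentions can be sketched in Python (the package itself is written in Julia). The function below is an illustrative implementation of the classical Adams-Bashforth-Moulton scheme for a scalar Caputo equation D^alpha y(t) = f(t, y) with 0 < alpha <= 1; it is not FdeSolver's actual API, and the function name is an assumption:

    ```python
    import math

    def fde_solve(f, alpha, y0, t_end, h):
        """Adams-Bashforth-Moulton predictor-corrector for the Caputo
        fractional ODE D^alpha y(t) = f(t, y), 0 < alpha <= 1, y(0) = y0."""
        n_steps = int(round(t_end / h))
        t = [j * h for j in range(n_steps + 1)]
        y = [y0] + [0.0] * n_steps
        fy = [f(t[0], y0)] + [0.0] * n_steps
        g1 = math.gamma(alpha + 1)
        g2 = math.gamma(alpha + 2)
        for n in range(n_steps):
            # Predictor: fractional rectangle (product-integration) rule.
            b = [(n + 1 - j) ** alpha - (n - j) ** alpha for j in range(n + 1)]
            y_pred = y0 + h ** alpha / g1 * sum(bj * fy[j] for j, bj in enumerate(b))
            # Corrector: fractional trapezoidal weights.
            a = [n ** (alpha + 1) - (n - alpha) * (n + 1) ** alpha]
            a += [(n - j + 2) ** (alpha + 1) + (n - j) ** (alpha + 1)
                  - 2 * (n - j + 1) ** (alpha + 1) for j in range(1, n + 1)]
            y[n + 1] = y0 + h ** alpha / g2 * (
                f(t[n + 1], y_pred) + sum(aj * fy[j] for j, aj in enumerate(a)))
            fy[n + 1] = f(t[n + 1], y[n + 1])
        return t, y
    ```

    For alpha = 1 the weights collapse to the familiar Euler predictor and trapezoidal corrector, which is a useful sanity check: the solver should then reproduce the ordinary ODE solution.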

    Ebola epidemic model with dynamic population and memory

    The recent outbreaks of Ebola have encouraged researchers to develop mathematical models for simulating the dynamics of Ebola transmission. We continue the study of such models, focusing on those with a variable population. This paper presents a compartmental model consisting of 8-dimensional nonlinear differential equations with a dynamic population and investigates its basic reproduction number. Moreover, a dimensionless model is introduced for numerical analysis, and the disease-free equilibrium is proven to be locally asymptotically stable whenever the threshold condition, known as the basic reproduction number, is less than one. Finally, we use a fractional differential form of the model to fit long time-series data for Guinea, Liberia, and Sierra Leone retrieved from the World Health Organization, and the numerical results demonstrate the performance of the model.
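
    The paper's 8-compartment model is not reproduced here, but the threshold role of the basic reproduction number can be illustrated with a minimal SIR model with vital dynamics (births and deaths at rate mu). All names and parameter values below are illustrative assumptions, not the paper's model:

    ```python
    def simulate_sir_vital(beta, gamma, mu, s0, i0, r0, t_end, dt):
        """Euler simulation of an SIR model with a dynamic population:
        births and natural deaths occur at the same per-capita rate mu."""
        s, i, r = s0, i0, r0
        for _ in range(int(t_end / dt)):
            n = s + i + r
            ds = mu * n - beta * s * i / n - mu * s   # births minus infection/death
            di = beta * s * i / n - gamma * i - mu * i
            dr = gamma * i - mu * r
            s, i, r = s + dt * ds, i + dt * di, r + dt * dr
        return s, i, r

    def basic_reproduction_number(beta, gamma, mu):
        # For this toy model the next-generation method gives R0 = beta / (gamma + mu).
        return beta / (gamma + mu)
    ```

    When R0 < 1 the infected compartment decays toward zero, matching the local stability of the disease-free equilibrium stated in the abstract.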

    Best practices in bibliographic data science

    Peer reviewed.

    Dependency detection with similarity constraints

    Unsupervised two-view learning, or detection of dependencies between two paired data sets, is typically done by some variant of canonical correlation analysis (CCA). CCA searches for a linear projection of each view such that the correlations between the projections are maximized. The solution is invariant to any linear transformation of either or both of the views; for tasks with small sample sizes, such flexibility implies overfitting, a problem that is even worse for more flexible nonparametric or kernel-based dependency discovery methods. We develop variants that reduce the degrees of freedom by assuming constraints on the similarity of the projections in the two views. A particular example is provided by a cancer gene discovery application where chromosomal distance affects the dependencies between gene copy number and activity levels. Similarity constraints are shown to improve the detection performance of known cancer genes. Comment: 9 pages, 3 figures. Appeared in proceedings of the 2009 IEEE International Workshop on Machine Learning for Signal Processing XIX (MLSP'09). Implementation of the method available at http://bioconductor.org/packages/devel/bioc/html/pint.htm
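
    As a concrete reference point, classical (unconstrained) CCA can be written in a few lines of NumPy using the standard whitening-plus-SVD formulation. This is the baseline the paper constrains, not the similarity-constrained variants it proposes; the small ridge term is an assumption added for numerical stability:

    ```python
    import numpy as np

    def cca(X, Y, k, eps=1e-6):
        """Classical CCA: whiten each view, then take the top-k singular
        pairs of the cross-covariance between the whitened views."""
        X = X - X.mean(axis=0)
        Y = Y - Y.mean(axis=0)
        n = X.shape[0]
        # Regularized within-view covariances (ridge keeps the inverses stable).
        Cxx = X.T @ X / n + eps * np.eye(X.shape[1])
        Cyy = Y.T @ Y / n + eps * np.eye(Y.shape[1])
        Cxy = X.T @ Y / n

        def inv_sqrt(C):
            # Inverse matrix square root via eigendecomposition.
            w, V = np.linalg.eigh(C)
            return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

        M = inv_sqrt(Cxx) @ Cxy @ inv_sqrt(Cyy)
        U, s, Vt = np.linalg.svd(M)
        Wx = inv_sqrt(Cxx) @ U[:, :k]   # projection for view X
        Wy = inv_sqrt(Cyy) @ Vt[:k].T   # projection for view Y
        return Wx, Wy, s[:k]            # s holds the canonical correlations
    ```

    The invariance problem the abstract describes is visible here: replacing X with X @ A for any invertible A leaves the canonical correlations unchanged, which is exactly the excess freedom the similarity constraints remove.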